PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Abstract

Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies focus solely on the recognition of cropped text line images, ignoring the error caused by text line detection in real-world applications. Although some approaches aimed at page-level text recognition have been proposed in recent years, they are either limited to simple layouts or require very detailed annotations, including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which is more robust and flexible when dealing with complex layouts, including multi-directional and curved text lines. Utilizing the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; however, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines. Extensive experiments conducted on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realms of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
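
The idea of turning character-level predictions plus a reading order into line-level output can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Char` record, its `next_idx` and `is_start` fields, and the `assemble_lines` helper are hypothetical and are not taken from the PageNet source code, which may represent reading order quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Char:
    """Hypothetical per-character prediction (not PageNet's actual format)."""
    text: str                        # recognized character
    box: Tuple[int, int, int, int]   # (x, y, w, h) bounding box
    next_idx: Optional[int]          # index of the next character, None at line end
    is_start: bool                   # True if this character begins a text line


def assemble_lines(chars: List[Char]) -> List[str]:
    """Follow reading-order links from each start character to build lines."""
    lines = []
    for i, c in enumerate(chars):
        if not c.is_start:
            continue
        out, seen, j = [], set(), i
        # Walk the successor links; the `seen` set guards against cycles.
        while j is not None and j not in seen:
            seen.add(j)
            out.append(chars[j].text)
            j = chars[j].next_idx
        lines.append("".join(out))
    return lines


# Two short lines whose characters are interleaved in detection order.
chars = [
    Char("书", (10, 10, 20, 20), 2, True),
    Char("手", (10, 50, 20, 20), 3, True),
    Char("法", (35, 12, 20, 20), None, False),
    Char("写", (35, 52, 20, 20), None, False),
]
print(assemble_lines(chars))
```

Because lines are recovered by following links rather than by sorting boxes along one axis, a sketch like this does not assume horizontal or straight text, which is in the spirit of the flexibility the abstract claims for curved and multi-directional lines.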

Related resources

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition, that can be optimized end-t...

End-to-end weakly-supervised semantic alignment

We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semanti...

Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model

When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recogni...

Towards End-to-End Speech Recognition

Standard automatic speech recognition (ASR) systems follow a divide and conquer approach to convert speech into text. Alternately, the end goal is achieved by a combination of sub-tasks, namely, feature extraction, acoustic modeling and sequence decoding, which are optimized in an independent manner. More recently, in the machine learning community deep learning approaches have emerged which al...

End-to-End Text Recognition with Hybrid HMM Maxout Models

The problem of detecting and recognizing text in natural scenes has proved to be more challenging than its counterpart in documents, with most of the previous work focusing on a single part of the problem. In this work, we propose new solutions to the character and word recognition problems and then show how to combine these solutions in an end-to-end text-recognition system. We do so by levera...

Journal

Journal title: International Journal of Computer Vision

Year: 2022

ISSN: 0920-5691, 1573-1405

DOI: https://doi.org/10.1007/s11263-022-01654-0